347 research outputs found

    Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

    Empirical substitution matrices represent the average tendencies of substitutions over various protein families by sacrificing gene-level resolution. We develop a codon-based model, in which mutational tendencies of codons, a genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated to be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins. 
The present codon-based model with the ML estimates of selective constraints and with adjustable nucleotide mutation rates would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724
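
The AIC comparison mentioned in this abstract can be illustrated with a minimal sketch. The formula AIC = 2k - 2 ln L is standard; the log-likelihoods and parameter counts below are hypothetical placeholders, not values from the paper, and the model names are only labels.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits (NOT from the paper): a model restricted to single
# nucleotide changes versus one that also allows multiple nucleotide changes.
aic_single = aic(log_likelihood=-1050.0, n_params=10)
aic_multiple = aic(log_likelihood=-1000.0, n_params=14)

# A positive difference means the richer model is preferred despite
# the penalty for its extra parameters.
delta_aic = aic_single - aic_multiple
print(aic_single, aic_multiple, delta_aic)
```

In this made-up case the extra four parameters cost 8 AIC units but the likelihood gain is worth 100, so the model with multiple nucleotide changes would be preferred.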

    Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

    Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators is biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a “corrected” empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the -style estimators
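
The source of the bias described in this abstract can be seen in a small sketch. It assumes the standard genetic code (stop codons TAA, TAG, TGA) and a simple product-of-nucleotide-frequencies estimator with one shared set of nucleotide frequencies; the function names are hypothetical, and this illustrates only the problem, not the authors' corrected estimator.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def product_codon_frequencies(nuc_freqs):
    """Sense-codon frequencies as products of nucleotide frequencies,
    with stop codons dropped and the remainder renormalized."""
    raw = {}
    for codon in map("".join, product(NUCLEOTIDES, repeat=3)):
        if codon in STOP_CODONS:
            continue
        p = 1.0
        for nuc in codon:
            p *= nuc_freqs[nuc]
        raw[codon] = p
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


def implied_nucleotide_frequencies(codon_freqs):
    """Nucleotide composition implied by a codon distribution,
    averaged over the three codon positions."""
    implied = dict.fromkeys(NUCLEOTIDES, 0.0)
    for codon, p in codon_freqs.items():
        for nuc in codon:
            implied[nuc] += p / 3.0
    return implied


# Feed in uniform nucleotide frequencies and read back what the
# resulting codon distribution actually implies.
uniform = dict.fromkeys(NUCLEOTIDES, 0.25)
codon_freqs = product_codon_frequencies(uniform)
implied = implied_nucleotide_frequencies(codon_freqs)
print(implied)
```

Because the three stop codons are rich in T and A and contain no C, removing them shifts the implied composition away from the input: the implied frequency of C rises above 0.25 and that of A falls below it, even though the input was uniform.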

    Chronic defensiveness and neuroendocrine dysfunction reflect a novel cardiac troponin T cut point: The SABPA study.

    Background: Sympatho-adrenal responses are activated as an innate defense coping (DefS) mechanism during emotional stress. Whether these sympatho-adrenal responses drive cardiac troponin T (cTnT) increases is unknown. Therefore, associations between cTnT and sympatho-adrenal responses were assessed. Methods: A prospective bi-ethnic cohort, excluding atrial fibrillation, myocardial infarction and stroke cases, was followed for 3 years (N=342; 45.6±9.0 years). We obtained serum high-sensitive cTnT and outcome measures [Coping-Strategy-Indicator, depression/Patient-Health-Questionnaire-9, 24h BP, 24h heart-rate-variability (HRV) and 24h urinary catecholamines]. Results: cTnT levels of the cohort remained similar over 3 years but recovery to cTnT-negative levels was higher in Blacks. Blacks showed moderate depression (45% vs. 16%) and 24h hypertension (67% vs. 42%) prevalence compared to Whites. A receiver-operating-characteristics cTnT cut-point 4.2 ng/L predicting hypertension in Blacks was used as binary exposure measure in relation to outcome measures [AUC 0.68 (95% CI 0.60-0.76); sensitivity/specificity 63/70%; P≤0.001]. In cross-sectional analyses, elevated cTnT was related to DefS [OR 1.08 (95% CI 0.99-1.16); P=0.06]; 24h BP [OR 1.03-1.04 (95% CI 1.01-1.08); P≤0.02] and depressed HRV [OR 2.19 (95% CI 1.09-4.41); P=0.03] in Blacks, but not in Whites. At 3 year follow-up, elevated cTnT was related to attenuated urine norepinephrine:creatinine ratio in Blacks [OR 1.46 (95% CI 1.01-2.10); P=0.04]. In Whites, a cut point of 5.6 ng/L cTnT predicting hypertension was not associated with outcome measures. Conclusion: Central neural control systems exemplified a brain-heart stress pathway. Desensitization of sympatho-adrenal responses occurred with initial neural- (HRV) followed by neuroendocrine dysfunction (norepinephrine:creatinine) in relation to elevated cTnT. 
Chronic defensiveness may thus drive the desensitization or physiological depression, reflecting ischemic heart disease risk at a 4.2 ng/L cTnT cut-point in Blacks

    Experimental evidence indicating that mastreviruses probably did not co-diverge with their hosts

    Background. Despite the demonstration that geminiviruses, like many other single stranded DNA viruses, are evolving at rates similar to those of RNA viruses, a recent study has suggested that grass-infecting species in the genus Mastrevirus may have co-diverged with their hosts over millions of years. This "co-divergence hypothesis" requires that long-term mastrevirus substitution rates be at least 100,000-fold lower than their basal mutation rates and 10,000-fold lower than their observable short-term substitution rates. The credibility of this hypothesis, therefore, hinges on the testable claim that negative selection during mastrevirus evolution is so potent that it effectively purges 99.999% of all mutations that occur. Results. We have conducted long-term evolution experiments lasting between 6 and 32 years, where we have determined substitution rates of between 2 and 3 × 10⁻⁴ substitutions/site/year for the mastreviruses Maize streak virus (MSV) and Sugarcane streak Réunion virus (SSRV). We further show that mutation biases are similar for different geminivirus genera, suggesting that mutational processes that drive high basal mutation rates are conserved across the family. Rather than displaying signs of extremely severe negative selection as implied by the co-divergence hypothesis, our evolution experiments indicate that MSV and SSRV are predominantly evolving under neutral genetic drift. Conclusion. The absence of strong negative selection signals within our evolution experiments and the uniformly high geminivirus substitution rates that we and others have reported suggest that mastreviruses cannot have co-diverged with their hosts. © 2009 Harkins et al; licensee BioMed Central Ltd

    Evolutionary distances in the twilight zone -- a rational kernel approach

    Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ON

    Mitochondrial DNA control region data from indigenous Angolan Khoe-San lineages

    Here we provide 129 complete mitochondrial control region sequences of indigenous Khoe-San individuals from Angola to contribute to the still underrepresented pool of data from Africa. The dataset consists of exclusively African lineages with a majority of Sub-Saharan haplogroups. The probability of a random match was calculated as 0.09. The data set comprises 21 haplotypes occurring more than once and 17 unique haplotypes. Upon publication, haplotypes were incorporated in the EMPOP database (www.empop.org; EMP00069) [1]. http://www.elsevier.com/locate/fsi
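
A random match probability of this kind is commonly computed as the sum of squared haplotype frequencies. The sketch below assumes that definition (the abstract does not state which formula the authors used) and runs on hypothetical haplotype counts, since the per-haplotype counts are not given here.

```python
def random_match_probability(haplotype_counts):
    """Sum of squared haplotype frequencies: the chance that two
    sequences drawn at random (with replacement) share a haplotype."""
    n = sum(haplotype_counts)
    return sum((count / n) ** 2 for count in haplotype_counts)


# Hypothetical counts (NOT the Angolan data): three haplotypes
# observed 3, 2 and 1 times among 6 sequences.
example_counts = [3, 2, 1]
rmp = random_match_probability(example_counts)
print(round(rmp, 4))
```

Under this definition the value falls as diversity rises: a sample in which every sequence is a different haplotype gives 1/n, while a sample dominated by a few shared haplotypes gives a much larger value.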

    CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

    Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, the number of which is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes

    Molecular evolution of HoxA13 and the multiple origins of limbless morphologies in amphibians and reptiles

    Developmental processes and their results, morphological characters, are inherited through transmission of genes regulating development. While there is ample evidence that cis-regulatory elements tend to be modular, with sequence segments dedicated to different roles, the situation for proteins is less clear, being particularly complex for transcription factors with multiple functions. Some motifs mediating protein-protein interactions may be exclusive to particular developmental roles, but it is also possible that motifs are mostly shared among different processes. Here we focus on HoxA13, a protein essential for limb development. We asked whether the HoxA13 amino acid sequence evolved similarly in three limbless clades: Gymnophiona, Amphisbaenia and Serpentes. We explored variation in ω (dN/dS) using a maximum-likelihood framework and HoxA13 sequences from 47 species. Comparisons of evolutionary models provided low global ω values and no evidence that HoxA13 experienced relaxed selection in limbless clades. Branch-site models failed to detect evidence for positive selection acting on any site along branches of Amphisbaena and Gymnophiona, while three sites were identified in Serpentes. Examination of alignments did not reveal consistent sequence differences between limbed and limbless species. We conclude that HoxA13 has no modules exclusive to limb development, which may be explained by its involvement in multiple developmental processes

    Recent acquisition of Helicobacter pylori by Baka Pygmies

    Both anatomically modern humans and the gastric pathogen Helicobacter pylori originated in Africa, and both species have been associated for at least 100,000 years. Seven geographically distinct H. pylori populations exist, three of which are indigenous to Africa: hpAfrica1, hpAfrica2, and hpNEAfrica. The oldest and most divergent population, hpAfrica2, evolved within San hunter-gatherers, who represent one of the deepest branches of the human population tree. Anticipating the presence of ancient H. pylori lineages within all hunter-gatherer populations, we investigated the prevalence and population structure of H. pylori within Baka Pygmies in Cameroon. Gastric biopsies were obtained by esophagogastroduodenoscopy from 77 Baka from two geographically separated populations, and from 101 non-Baka individuals from neighboring agriculturalist populations, and subsequently cultured for H. pylori. Unexpectedly, Baka Pygmies showed a significantly lower H. pylori infection rate (20.8%) than non-Baka (80.2%). We generated multilocus haplotypes for each H. pylori isolate by DNA sequencing, but were not able to identify Baka-specific lineages, and most isolates in our sample were assigned to hpNEAfrica or hpAfrica1. The population hpNEAfrica, a marker for the expansion of the Nilo-Saharan language family, was divided into East African and Central West African subpopulations. Similarly, a new hpAfrica1 subpopulation, identified mainly among Cameroonians, supports eastern and western expansions of Bantu languages. An age-structured transmission model shows that the low H. pylori prevalence among Baka Pygmies is achievable within the timeframe of a few hundred years and suggests that demographic factors such as small population size and unusually low life expectancy can lead to the eradication of H. pylori from individual human populations. The Baka were thus either H. pylori-free or lost their ancient lineages during past demographic fluctuations. 
Using coalescent simulations and phylogenetic inference, we show that Baka almost certainly acquired their extant H. pylori through secondary contact with their agriculturalist neighbors

    Modeling HIV-1 Drug Resistance as Episodic Directional Selection

    The evolution of substitutions conferring drug resistance to HIV-1 is both episodic, occurring when patients are on antiretroviral therapy, and strongly directional, with site-specific resistant residues increasing in frequency over time. While methods exist to detect episodic diversifying selection and continuous directional selection, no evolutionary model combining these two properties has been proposed. We present two models of episodic directional selection (MEDS and EDEPS) which allow the a priori specification of lineages expected to have undergone directional selection. The models infer the sites and target residues that were likely subject to directional selection, using either codon or protein sequences. Compared to its null model of episodic diversifying selection, MEDS provides a superior fit to most sites known to be involved in drug resistance, and neither a test for episodic diversifying selection nor a test for constant directional selection is able to detect as many true positives as MEDS and EDEPS while maintaining acceptable levels of false positives. This suggests that episodic directional selection is a better description of the process driving the evolution of drug resistance